Unit 4: Advanced NLP Techniques and Transformer Models

Recurrent Neural Networks (RNNs)

What is RNN

Recurrent Neural Network (RNN) is a type of neural network designed to work with sequence data.

Sequence data means data where order matters.

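Examples: text (words in a sentence), speech (sound over time), and time series (values recorded day by day).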

RNN remembers previous information while processing new input.

So, it is useful for language tasks.


Main Idea of RNN

RNN processes data step by step.

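At each step, it uses:

  1. The current input (the current word)

  2. The hidden state (memory) carried over from the previous step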

This helps RNN understand context.

Example:

In the sentence
"I am learning AI"

To understand "AI", the model remembers "I am learning".


Working of RNN

RNN has a loop structure.

Output at each step depends on:

  1. Current input

  2. Previous hidden state (the memory passed around the loop)

So information flows from past to present.
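
The loop can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the hidden state is a single number and uses made-up weights (w_x, w_h and b are illustrative names, not from these notes). Real RNNs use vectors and weights learned during training.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of the loop: new memory = tanh(current input + old memory)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Process the inputs one by one, carrying the hidden state forward."""
    h = 0.0  # empty memory before the first input
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Same last two inputs, different first input.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print(states_a)
print(states_b)
```

In the first run, the last hidden state is still above zero even though the last two inputs are zero. The first input is still remembered, but its effect gets weaker at every step.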


Advantages and Limitations

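Advantages:

  1. Handles sequence data where order matters

  2. Uses previous information to understand context

Limitations:

  1. Forgets old information in long sequences

  2. Slow, because it processes data step by step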


Long Short-Term Memory (LSTM)

What is LSTM

Long Short-Term Memory (LSTM) is a special type of RNN.

It is designed to solve the memory problem of RNN.

LSTM can remember important information for a long time.


Why LSTM is Needed

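Normal RNN forgets old information quickly. During training, the learning signal from early steps shrinks as it passes back through many steps (the vanishing gradient problem).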

Example:

In a long paragraph, RNN may forget the starting words.

LSTM solves this by using memory cells and gates.


Main Components of LSTM

LSTM has three main gates:

  1. Forget Gate
    Decides what information to remove

  2. Input Gate
    Decides what new information to store

  3. Output Gate
    Decides what information to send out

These gates control memory flow.


Working of LSTM

LSTM keeps a memory cell.

It updates this memory using gates.

Important information is kept.
Unimportant information is removed.

This makes LSTM powerful for long texts.
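
The gate idea can be sketched with single numbers instead of vectors. This is a simplified sketch under assumptions: the weights are hand-picked, not learned, and the name params and its gate labels are made up for this example.

```python
import math

def sigmoid(z):
    """Squash a number into the range 0 to 1 (0 = closed gate, 1 = open gate)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. params maps each gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much old memory to keep
    i = sigmoid(pre("input"))         # how much new information to store
    o = sigmoid(pre("output"))        # how much memory to send out
    g = math.tanh(pre("candidate"))   # the new information itself
    c_t = f * c_prev + i * g          # updated memory cell
    h_t = o * math.tanh(c_t)          # output / hidden state
    return h_t, c_t

# Hand-picked weights: forget gate almost fully open, input gate almost closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # start with the value 1.0 stored in memory
for _ in range(50):
    h, c = lstm_step(0.3, h, c, params)
print(round(c, 3))
```

After 50 steps the memory cell is still close to 1.0. The forget gate keeps the old value and the input gate blocks new values, which is how LSTM holds information over long texts.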


Advantages and Limitations

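Advantages:

  1. Remembers important information for a long time

  2. Works well on long texts

Limitations:

  1. Still processes data step by step, so training is slow

  2. More complex than a normal RNN because of the gates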


Implementation using Keras and TensorFlow

Keras and TensorFlow

TensorFlow is a deep learning framework.

Keras is a high-level API that runs on top of TensorFlow.

They are used to build neural networks easily.


Purpose of Using Keras and TensorFlow

They help to:

  1. Build models from ready-made layers

  2. Train models on data

  3. Test models on new data


Basic Steps of Implementation

  1. Import libraries

  2. Prepare text data

  3. Convert text to numbers (tokenization)

  4. Build RNN/LSTM model

  5. Compile model

  6. Train model

  7. Test model
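
The seven steps can be sketched as one short script. This is a rough sketch, not a tuned model: the four sentences and their labels are made up, and the sizes (50 tokens, sequence length 5, 8 embedding values, 16 LSTM units, 5 epochs) are arbitrary choices for illustration. It assumes TensorFlow 2.x is installed.

```python
# 1. Import libraries
import numpy as np
import tensorflow as tf

# 2. Prepare text data (tiny made-up dataset: 1 = positive, 0 = negative)
texts = np.array(["i love this movie", "great film",
                  "i hate this movie", "terrible film"])
labels = np.array([1, 1, 0, 0], dtype="float32")

# 3. Convert text to numbers (tokenization)
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=50, output_sequence_length=5)
vectorizer.adapt(texts)                            # learn the vocabulary
x = np.asarray(vectorizer(tf.constant(texts)))     # shape (4, 5)

# 4. Build LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=50, output_dim=8),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# 5. Compile model
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# 6. Train model
model.fit(x, labels, epochs=5, verbose=0)

# 7. Test model on a new sentence
new_x = np.asarray(vectorizer(tf.constant(["i love this film"])))
prob = float(model.predict(new_x, verbose=0)[0, 0])
print("Probability of positive:", prob)
```

With only four training sentences the prediction is not meaningful. The point is the order of the steps. Replacing the LSTM layer with tf.keras.layers.SimpleRNN(16) gives a basic RNN model instead.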


Example Use

Using Keras, we can easily create:

  1. RNN models

  2. LSTM models


Introduction to Transformers (BERT, GPT)

What is Transformer

Transformer is a modern deep learning model used in NLP.

It does not use RNN or LSTM.

Instead, it uses the attention mechanism.

Attention helps the model focus on important words.


Main Idea of Transformers

Transformers process all words at the same time.

They find relationships between words using attention.

So they are:

  1. Faster to train than RNN and LSTM

  2. Better at connecting words that are far apart


Attention Mechanism

Attention tells the model:

“Which words are important for this word?”

Example:

"I went to the bank to deposit money"

Attention helps the model understand that "bank" means a financial bank, because it links "bank" to "deposit" and "money".
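
A minimal sketch of how attention weights can be computed, using scaled dot-product attention (the form used in Transformers). The word vectors below are made up for illustration. In a real model they are learned, much longer, and passed through separate query, key and value projections, which this sketch leaves out.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors.
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
    return weights, context

# Made-up 2-number vectors, for illustration only.
vectors = {
    "to": [0.0, 0.1],
    "bank": [1.0, 0.2],
    "deposit": [0.9, 0.1],
    "money": [0.8, 0.3],
}
words = list(vectors)
keys = [vectors[w] for w in words]

# Ask: which words are important for "bank"?
weights, context = attention(vectors["bank"], keys, keys)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

With these vectors, "deposit" and "money" get larger weights than "to", so they contribute more to the new vector for "bank".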


BERT and GPT

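BERT: encoder-based model that reads context from both sides; mainly used for understanding text.

GPT: decoder-based model that reads from left to right; mainly used for generating text.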

Both are based on transformers.


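Advantages of Transformers

  1. Process words in parallel, so training is fast

  2. Capture relationships between distant words

  3. Give very high accuracy on NLP tasks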


Pre-trained Models and Fine-Tuning

Pre-trained Models

Pre-trained models are models trained on huge datasets.

They already know general language patterns, such as:

  1. Grammar

  2. Word meanings in context

Examples: BERT, GPT


Why Pre-trained Models are Used

Training from scratch needs:

  1. Huge datasets

  2. High computing power

  3. A lot of time and money

Pre-trained models save time and money.


Fine-Tuning

Fine-tuning means:

Using a pre-trained model and training it again on your own data.

The weights change only slightly, because training starts from the pre-trained values instead of random values.

So the model learns your specific task.


Process of Fine-Tuning

  1. Load pre-trained model

  2. Add task-specific layer

  3. Train on new dataset

  4. Adjust weights slightly

  5. Test performance
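
The steps above can be sketched with a toy example. This is a heavily simplified variant: the "pre-trained model" is a hypothetical stand-in function (pretrained_features) that is kept frozen, and only the new task-specific layer is trained. Real fine-tuning loads a model such as BERT and usually updates its weights slightly as well. The data and word lists are made up.

```python
import math

# Step 1: stand-in for a loaded pre-trained model (hypothetical, kept frozen).
def pretrained_features(text):
    words = text.lower().split()
    positive = sum(w in {"good", "great", "love"} for w in words)
    negative = sum(w in {"bad", "poor", "hate"} for w in words)
    return [float(positive), float(negative)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 2: add a task-specific layer (here, a tiny sentiment classifier).
head_w = [0.0, 0.0]
head_b = 0.0

def predict(text):
    f = pretrained_features(text)
    return sigmoid(sum(w * x for w, x in zip(head_w, f)) + head_b)

# Step 3: train on the new (small) dataset. 1 = positive, 0 = negative.
train = [("I love this great film", 1), ("good story", 1),
         ("bad acting", 0), ("I hate this poor script", 0)]

# Step 4: adjust weights slightly, using a small learning rate.
lr = 0.1
for _ in range(200):
    for text, label in train:
        f = pretrained_features(text)
        err = predict(text) - label
        head_w = [w - lr * err * x for w, x in zip(head_w, f)]
        head_b -= lr * err

# Step 5: test performance on sentences not seen in training.
print(predict("great film"), predict("poor film"))
```

Only four labelled sentences are needed here because the frozen part already supplies useful features. That is the main reason fine-tuning works with little data.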


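Advantages of Fine-Tuning

  1. Needs less data than training from scratch

  2. Saves time and money

  3. Gives high accuracy on the specific task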


Comparison of RNN, LSTM, and Transformers

Key Differences

Feature    | RNN        | LSTM       | Transformer
Memory     | Short      | Long       | Very Long
Speed      | Slow       | Slower     | Fast
Structure  | Sequential | Sequential | Parallel
Accuracy   | Medium     | High       | Very High
Used Today | Rare       | Limited    | Very Common

One-Line Summary for Exam

RNN processes sequential data using memory, LSTM improves RNN by storing long-term information, transformers use attention for better context understanding, and pre-trained models with fine-tuning provide high accuracy with less data.


Memory Shortcut

RNN → Basic memory
LSTM → Strong memory
Transformer → Attention power
Pre-trained → Ready model
Fine-tuning → Customize model


BERT and GPT in Detail (Transformer-Based Models)

Both BERT and GPT are advanced NLP models based on the Transformer architecture.
They are often called Large Language Models because they have a very large number of parameters and are trained on very large text data.

They understand language using the attention mechanism instead of RNN or LSTM.


BERT (Bidirectional Encoder Representations from Transformers)

What is BERT

BERT is a Transformer-based model designed mainly for understanding text.

Full form:
Bidirectional Encoder Representations from Transformers

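Meaning:

Bidirectional → reads context from both left and right
Encoder → uses the encoder part of the Transformer
Representations → turns each word into a vector that captures its meaning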

So, BERT understands words using left and right context together.


Key Idea of BERT (Bidirectional Reading)

Traditional models read text in one direction.

Example sentence:

"I went to the bank to deposit money"

Unidirectional model:
Reads only from left to right, so when it reaches "bank" it has not yet seen "deposit" or "money".

BERT:
Reads from both sides.

So it knows:

"bank" is related to "deposit" and "money"

This gives better understanding.


Architecture of BERT

BERT uses only the encoder part of the Transformer.

Structure:

Input → Embedding → Encoder Layers → Output

It does not use decoder.

So BERT is mainly for analysis and understanding, not generation.


Input Format of BERT

Before sending text to BERT, it is converted into a special format.

Example:

[CLS] I love machine learning [SEP]

Where:

[CLS] → Classification token
[SEP] → Separator token

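[CLS] is added at the start, and its final vector is used for classification tasks. [SEP] marks the end of a sentence and separates the two sentences when a pair is given.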
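
A small sketch of building this format. It assumes simple splitting on spaces, which is a simplification: real BERT tokenizers split text into subword pieces (WordPiece). The function name to_bert_input is made up for this example.

```python
def to_bert_input(sentence_a, sentence_b=None):
    """Wrap one sentence (or a sentence pair) in BERT's special tokens."""
    tokens = ["[CLS]"] + sentence_a.split() + ["[SEP]"]
    if sentence_b is not None:
        # A second sentence is added after the first [SEP] and closed by its own [SEP].
        tokens += sentence_b.split() + ["[SEP]"]
    return tokens

single = to_bert_input("I love machine learning")
pair = to_bert_input("I am studying NLP", "It is very interesting")
print(" ".join(single))
print(" ".join(pair))
```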


Pre-Training of BERT

BERT is trained using two main tasks.

1. Masked Language Model (MLM)

Some words are hidden (masked) at random, about 15% of the tokens in the original BERT.

Example:

"I love [MASK] learning"

Model predicts missing word.

Output:
"machine"

This helps BERT learn deep meaning.
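
A minimal sketch of how masked training examples can be prepared. It is simplified: real BERT works on subword tokens and sometimes replaces a chosen token with a random word or leaves it unchanged instead of always using [MASK]. The function name and the mask_prob and seed parameters are made up for this example.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide some tokens and remember the originals as training targets."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")   # the model sees this
            labels.append(tok)        # the model must predict this
        else:
            masked.append(tok)
            labels.append(None)       # nothing to predict at this position
    return masked, labels

tokens = "I love machine learning".split()
# A high mask_prob is used here only so that a short sentence is likely to get a mask.
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(masked)
print(labels)
```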


2. Next Sentence Prediction (NSP)

Two sentences are given.

Model predicts:
Does sentence B actually follow sentence A in the original text?

Example:

Sentence A: I am studying NLP
Sentence B: It is very interesting

B follows A → Yes

This was designed to help with sentence-pair tasks such as question answering.


Working Principle of BERT

  1. Input sentence is tokenized

  2. Converted into embeddings

  3. Passed through encoder layers

  4. Attention connects related words

  5. Final vectors represent meaning

Each word gets a context-aware vector.


Applications of BERT

BERT is mainly used for:

  1. Text classification (output is a label)

  2. Question answering (output is an answer)

It is best for tasks where understanding is important.


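Advantages of BERT

  1. Understands each word using both left and right context

  2. Gives a context-aware vector for every word


Limitations of BERT

  1. Not designed for text generation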


GPT (Generative Pre-trained Transformer)

What is GPT

GPT is a Transformer-based model designed mainly for text generation.

Full form:
Generative Pre-trained Transformer

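Meaning:

Generative → generates new text
Pre-trained → already trained on huge text data
Transformer → built on the Transformer architecture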

So GPT is mainly used for writing and generating language.


Key Idea of GPT (Unidirectional Reading)

GPT reads text from left to right only.

Example:

"I am learning artificial intelligence"

GPT predicts:

"I" → "am" → "learning" → "artificial" → "intelligence"

One word at a time.

This is called autoregressive modeling.


Architecture of GPT

GPT uses only the decoder part of the Transformer.

Structure:

Input → Embedding → Decoder Layers → Output

It does not use encoder.

So GPT focuses on generation.


Working Principle of GPT

GPT learns:

Given previous words, predict next word.

Example:

Input: "India is a"

Output: "country"

Then:

"India is a country"

Next prediction: "in"

This continues.
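
The predict-and-append loop can be sketched with a toy model. This is not how GPT works inside: the sketch uses a bigram model that looks only at the last word and counts word pairs in a one-sentence made-up corpus. GPT uses a Transformer that looks at all previous words. Only the generation loop is the same idea.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which word in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_words):
    """Repeatedly predict the most likely next word and append it."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # no known continuation, stop generating
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

counts = train_bigram(["India is a country in Asia"])
text = generate(counts, "India is a", max_new_words=2)
print(text)  # india is a country in
```

Each new word is added to the input before the next prediction is made. This is what autoregressive means.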


Pre-Training of GPT

GPT is trained using:

Language Modeling Task

Formula:

Predict next word:

P(wₙ | w₁, w₂, ..., wₙ₋₁)

Meaning:
Probability of next word depends on previous words.

It reads billions of sentences and learns patterns.
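
The same idea can be written for a whole sentence. This is a sketch of the common textbook form, using the same symbols as the formula above; the loss symbol is an assumed name, not from these notes. The probability of a sentence is the product of the next-word probabilities, and training tries to make the negative log of that product small:

```latex
\[
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
\]
\[
\mathcal{L} = -\sum_{i=1}^{n} \log P(w_i \mid w_1, \dots, w_{i-1})
\]
```

A lower loss means the model gives higher probability to the words that actually come next.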


Training Method of GPT

GPT is not trained in a separate stage for each skill. By reading huge text data and predicting the next word, it gradually learns:

  1. Grammar and structure

  2. Writing style

  3. Reasoning patterns

This makes GPT a general-purpose model.


Fine-Tuning of GPT

After pre-training, GPT is trained again on task-specific data.

Fine-tuning adapts GPT to specific tasks.


Applications of GPT

GPT is used for:

  1. Text generation and content writing

  2. Chatbots and conversation

It is best for creative and interactive tasks.


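Advantages of GPT

  1. Generates fluent, human-like text

  2. General-purpose: one model can be adapted to many tasks


Limitations of GPT

  1. Reads only from left to right, so it cannot use the words that come after a position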


Comparison of BERT and GPT

Key Differences

Feature      | BERT            | GPT
Direction    | Both sides      | Left to right
Architecture | Encoder         | Decoder
Main Use     | Understanding   | Generation
Best For     | Analysis tasks  | Writing tasks
Output       | Labels, answers | Text

BERT vs GPT in Simple Words

BERT is good at:

Reading and understanding

GPT is good at:

Writing and generating

BERT = Reader
GPT = Writer


Role in Modern NLP

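Today, most advanced NLP systems use Transformer-based models such as BERT and GPT.

Some models combine both ideas by using an encoder and a decoder together.

Example:
T5, BART

PaLM, by contrast, is a decoder-only model like GPT.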


One-Line Summary for Exam

BERT is a bidirectional transformer model used for understanding text, while GPT is a unidirectional transformer model used for generating human-like language.